Let e(x,y) denote an arbitrarily complex (nonatomic) predicate, and
let e(X,y) denote the special form that this predicate will assume when the
initial multi-component variable of x is replaced by the fixed tuple of X.
This thesis will study the question of how the set of y-records that satisfy
the condition of e(X,y) can be efficiently retrieved. Such retrievals will
be examined in the context of a "dynamic" environment where new records are
being continually inserted, deleted, and updated. This thesis will show that
every conceivable predicate of e(x,y) can be assigned a database and
a number of D(e) such that

i) For any element of x, the set of y-records that satisfy
e(x,y) can be retrieved in O(log^{D(e)} N) time (from the
initial N-membered database).

ii) The relevant database will be sufficiently flexible so that
the user can insert, delete, or modify any of its y-records
in O(log^{D(e)} N) time.

The above results will be extremely significant because the number of
D(e) will usually assume small values such as zero, one, or two.
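
To make the small values of D(e) concrete, the following Python sketch shows two elementary indexes. It is an illustration under stated assumptions, not the construction developed in this thesis: it assumes that D(e) = 0 corresponds to an equality predicate such as x.a = y.b served by hashing, and that D(e) = 1 corresponds to a single range predicate such as x.a < y.b served by an ordered index. The class names, method names, and record values are hypothetical. A sorted Python list stands in for a balanced tree, so the search step takes O(log N) comparisons, but insertion and deletion shift list elements and are therefore O(N) here; a B-tree would be needed to make the updates logarithmic as well. The time to copy out the answer set is not counted.

```python
import bisect
from collections import defaultdict


class EqualityIndex:
    """Hash index for the assumed predicate e(x, y): x.a == y.b."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def insert(self, key, record):
        self._buckets[key].append(record)

    def delete(self, key, record):
        # Linear in the bucket size; raises ValueError if the record is absent.
        self._buckets[key].remove(record)

    def retrieve(self, x_value):
        # Expected constant-time lookup of the matching bucket.
        return list(self._buckets.get(x_value, []))


class RangeIndex:
    """Ordered index for the assumed predicate e(x, y): x.a < y.b."""

    def __init__(self):
        self._keys = []      # y.b values, kept in ascending order
        self._records = []   # records, parallel to self._keys

    def insert(self, key, record):
        # Binary search for the position, then a list insert (O(N) shift).
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._records.insert(pos, record)

    def delete(self, key, record):
        # Scan the run of equal keys for the exact record.
        pos = bisect.bisect_left(self._keys, key)
        while pos < len(self._keys) and self._keys[pos] == key:
            if self._records[pos] == record:
                del self._keys[pos]
                del self._records[pos]
                return
            pos += 1
        raise KeyError((key, record))

    def retrieve(self, x_value):
        # All records whose key is strictly greater than x_value.
        pos = bisect.bisect_right(self._keys, x_value)
        return self._records[pos:]
```

In this sketch the fixed tuple X is reduced to a single value passed to `retrieve`, and each y-record is stored under the one attribute that the predicate inspects.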

The  principal  application  of  this  thesis  will  be  in  the  area  of 
automatic  programming.  The  purpose  of  that  branch  of  computer  science 
has  been  to  discover  how  automatic  algorithms  can  be  developed  which  do 
much  of  the  programming  that  has  traditionally  been  assigned  to  human 
beings. 

Such automatic algorithms have been advocated by many computer
scientists (Codd-70, Boehm-72, Date-77) because these procedures would
dramatically reduce the cost of writing computer programs. The combined
work of the cited authors has shown that

i) the cost of developing computer software may greatly exceed
hardware costs in the 1980's (Boehm has estimated that computer
programmer labor costs will constitute 90 percent of all the
Air Force's 1985 computer-related expenditures),

ii) and that the health of the computer industry requires lower
software development costs (even if this is done in the context
of a trade-off that modestly increases the hardware costs).

The importance of automatic database search algorithms was further confirmed
in a recent panel discussion (Oadd ■7-S). The members of that panel concluded
that such automatic search algorithms would be extremely useful
if these algorithms could be made to be moderately efficient. This thesis
will lay the foundations of the theory that should be used in the development
of automatic predicate searching algorithms.
no proof. All "Observations" in this thesis are given identification

The concept of a super-B-tree will be extremely important.


Dan Willard's Doctoral Thesis

Harvard Mathematics Department


this case the thesis should be read in two stages. In the first

three sections have been examined.
previous research into the efficiency of relational databases

Codd's work has generated a great deal of interest in the computer
science community. The magnitude of this interest can be seen upon examination

given p, up-77. In the third paragraph of that article, Blasgen and
Eswaran state that "little" has been published about the subject of efficiency.
in the previous relational literature. Most previous measurements
of runtime only held for certain predicates and for uniform
probability distributions of data. In contrast, the "worst-case hash"


selection command has been defined to be a primitive that requests the subset


latter article discussed this topic in the greatest generality.
Blasgen and Eswaran's work is therefore described below.

The .Mails discussed optimization in a one and two variable setting whereas the QIWL papers
have considered optimizations that involve requests with a larger number of


jirot'irypc  U*M  W’i’d  support  the  lnycu^c.  Ilir  rMcn*  <>(  t!.li  cflcct 
again. It should be stated that the difficulty in query

mathematical foundation that will be needed by a future relational database


SOS nodes will generally be assumed to contain

Traditional B-trees will not operate efficiently if an SOS field is
The formal description of the super-B-tree algorithm will be presented
The symbol of p(v) will henceforth denote that quotient which is N /N
ratios are greater than a for every node, v, of these trees.
ancestor nodes whose p(v) value lies outside the required
In this chapter, the small letter symbol of "c" will denote some
deletion command to consume more than r time. Also, note that CERT

us further assume that (C^l << Ic^l and in this case, it



that the CERT or SCERT measurements of the runtimes of these algorithms


by  a  more  uniform  probability  distribution.  Realistic  examples 
help prove this section's main theorem.
very closely related to Theorem 3. 2. S (of the last section). That theorem stated
consequently prefer to omit the remaining sections of this chapter on their first
reading of this thesis. The thesis subject matter has been organized so that the
us begin by examining node v. Note Diagram 3. 2. E showed that all y-leaves
paragraph. That definition has been carefully designed to assure that all
these parameters. The next several paragraphs will offer a brief summary of
The last parameter of the Alg(o, K, J) procedure will be
Finally, the fourth step will take step 3's SOS fields and instruct module

a super-B-tree whose data-structure


Observation \S._K. Let us recall that part 3.4.C of this chapter gave the
definition of a data-structure which is a super-B-tree "representation" of a
specified set of leaves. Let S(| denote an initial set of leaves, T denote

which represents this set of leaves, v denotes a current interior node of
T, and « denotes one of the lt| , , p.,- py rotations (which
were previously illustrated in Diagrams 3. 1. 1) through 3. 2.C). Note that the
occasion when this subroutine is invoked.
Proof: Note that the preceding lemma indicated that the p(v) values of all


symbol of ⌈r⌉ will denote the least integer that is greater than or equal to
r. This notation will be used in the next several lemmas.

hypothesis placed on sequence C) . Also the hypothesis implies that this same
change must be at least as large as - r ^ . Furthermore, the definition of
There are many examples of natural applications of super-B-trees.


set that were previously illustrated in the familiar Diagram 3. 2. B


•Hie  dlacucslon  In  U.la  Knurl  clearly  contained 
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itc-fmcr-rce  li.’o  itie  (rl/J«l  p>occ<fiirt  which  •  onllmi  illy  •  “■•.ire*  tl:' »  8»ncwl«je 


e(x,y) will be said to be an E-2 predicate if it is either an E-1 predicate

An E-K predicate will be said to have a complexity of m if it
Lemma 4. 4. C stated that all B-2 expressions (including the above
to the SUM-1 algorithm and instruct this procedure to use the

theorem describing its correctness and an example.


user instructs the compiler to develop the capacity for handling predicate


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ilKJicd.  Tlv  produced  database  will  I*-  xu.truntccO  to  insure  that  N*h  modules 


to  pivclw  ly  K  .  Thun,  equation  3  la  on  example  of  a  NBC*- 1-3 


Clearly all the c( predicates have a complexity equal to K-1 (since


evident. The first term refers to a physically constructed data-structure. The
, in equation 1. Note that this expression had a complexity to K. The
algorithm. The proof will be divided into two parts where Propositions I and II
are separately confirmed. We now begin the proof with Proposition I.

Proof of Proposition II. The proof will be divided into three parts where the
runtimes of the three steps of the FIND-1-K algorithm are separately examined.
that example, it will perform its task in optimal runtime magnitudes.
y-record should either be inserted or deleted in the R relations.
is deallocated and returned to the system garbage-collector.
perform all four of its needed tasks.
time. Also note that Lemma 5. 3. C indicated that there will
each patch. The inductive hypothesis will allow us to presume
algorithms. We will see that these algorithms will generally make subroutine
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tin*  like  ly  ami  com  of  computer  memory  ami  of  Insertion, 
will be the respective x and y-equality attributes of that expression.
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will  depend  on  whether  or  not  8'Kt)  a 
will deallocate its storage space and disband the corresponding
will be located by this hash. That pointer contains the address of the 0 .(21)
us assume that 3 denotes its decomposition into I** -2 parts.


The remaining proof follows from two simple facts. The first is that
expressions." Section 6. 7 will discuss the further techniques that are needed
condition. The latter symbol will presume that v denotes an interior node of
A two part inductive definition of this concept is given below:
paragraphs.

algorithms will produce databases for a given E-3 predicate that contain
common data-substructures. Note that the l^-Sfe, R^) data-image was

Throughout this chapter, the symbol of "Q^ -K(e, R^)" will be defined
only when e(x,y) belongs to the B-K class of expressions (for
either an E-3 expression or a conjunction of several range-terms with an
y.b ) AND x.a < y.b ]
difference between these propositions is the notation which they use.
The SUM-4 algorithm will execute Steps 1 through 3 when it is


,,WN) time (where the I (x) term is qualified

SUM-4, COUNT-4 and FIND-4 algorithms are important because they will
ification time. These changes are important because the higher-level
iarly this terminology implies that


COMPARE-COUNT algorithm will produce the preceding